Unzipped genome assemblies of polyploid root-knot nematodes reveal unusual and clade-specific telomeric repeats

Using long-read sequencing, we assembled and unzipped the polyploid genomes of Meloidogyne incognita, M. javanica and M. arenaria, three of the most devastating plant-parasitic nematodes. We found the canonical nematode telomeric repeat to be missing in these and other Meloidogyne genomes. In addition, we find no evidence for the enzyme telomerase or for orthologs of C. elegans telomere-associated proteins, suggesting alternative lengthening of telomeres. Instead, analyzing our assembled genomes, we identify species-specific composite repeats enriched mostly at one extremity of contigs. These repeats are G-rich, oriented, and transcribed, similarly to canonical telomeric repeats. We confirm them as telomeric using fluorescent in situ hybridization. These repeats are mostly found at one single end of chromosomes in these species. The discovery of unusual and specific complex telomeric repeats opens a plethora of perspectives and highlights the evolutionary diversity of telomeres despite their central roles in senescence, aging, and chromosome integrity.


2) M. javanica
The majority of the contigs had approximately the same coverage. One contig with high coverage was annotated as the mitochondrial genome of M. javanica and removed from the assembly. A low percentage of the contigs (0.85%) was assigned to Arthropoda, but this may result from a lack of information in the NCBI database.

3) M. arenaria
One contig was identified with a high coverage (> 10^3) compared to the others. This contig corresponded to the mitochondrial genome of M. arenaria and was eliminated from the final assembly. No contamination was observed in the final assembly.
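
The coverage screen described above can be sketched as follows. This is only an illustration: the text does not state how per-contig coverage was computed, so the input dictionary, the function names and the example values are assumptions, and the default threshold of 10^3 is borrowed from the M. arenaria note.

```python
from statistics import median

def flag_high_coverage(coverage_by_contig, threshold=1e3):
    """Return the names of contigs whose mean coverage exceeds `threshold`."""
    return sorted(
        name for name, cov in coverage_by_contig.items() if cov > threshold
    )

def fold_over_median(coverage_by_contig):
    """Express each contig's coverage as a multiple of the median coverage."""
    med = median(coverage_by_contig.values())
    return {name: cov / med for name, cov in coverage_by_contig.items()}

# Illustrative values only (hypothetical contig names, not taken from the assemblies).
example = {
    "contig_1": 62.0,
    "contig_2": 58.5,
    "contig_3": 60.1,
    "contig_mt": 4200.0,
}

candidates = flag_high_coverage(example)
ratios = fold_over_median(example)
print(candidates)
```

A flagged contig is only a candidate: as in the notes above, it would still need to be annotated (for example as the mitochondrial genome) before being removed from the assembly.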

Supplementary Table 2: annotation of (retro)transposons and other repeats
The annotation of transposable elements and other repetitive regions on the genomes of M. incognita, M. arenaria, and M. javanica was performed using EDTA 8

Supplementary Figure 5: Subgenome assignment based on gene collinearity and Ks
A) Distribution of median Ks values between genes constituting triplicated (Minc) and 'quadruplicated' (Mjav, Mare) blocks, showing two peaks. In Minc, the lower Ks value represents the divergence between A-A' while the higher value represents the divergence between A-B. This property allowed determining which Minc contigs were A and which were B. Similarly, in Mjav and Mare, the lower Ks value represents the divergence between A-A' or B-B' while the higher value represents the divergence between the A and B subgenomes. B) Example of a topology obtained by computing a Euclidean distance based on the median Ks between conserved collinear blocks of Minc, Mjav, and Mare. Based on the position of A and B Minc contigs in the tree, the contigs of Mjav and Mare were assigned to either an A or B subgenome. C) Median Ks values for Minc, Mare, and Mjav subgenome comparisons. D) Example of collinear blocks of at least 10 genes conserved between the 3 species and present in 3 copies in Minc and in 4 copies in Mjav and Mare. In total, we found 110 such conserved collinear blocks.

Supplementary Figure 13: consensus of the enriched repeat at M. arenaria contig extremities
AGnACCnTnTTGGACCCGGGGGGTCATTAGAGTACCGTGTCGGGGCCGAGGGACCATTAGAGTCGGTGCA
GGAGAAGTTGTAGACGTCTGGCAGGAGCTGGCAGGAGCAGTTGAAGTTGTAGACGTCTGGCAGGAGCAG
TTGAAGTTGTAGACGGnCCGTTGAAGTGCnGGAAGGGGGGGGGGGGnGGnnCAAATCTAAGGTCTAAGTG
CCCTACTGTCTACACCTATTGAATAACAACCCGTGCCTTTGGATATAACTCGGAGGTATGAAAGCCTGAAC
CTA
(in italics: positions of the primers)

Density of repeat patterns and G4-quadruplex along M. incognita contig extremities. The density of Minc repeat units is represented by a color gradient with positive values (red) indicating a density in the sense strand and negative values (blue) indicating a density on the reverse complement strand. Gray triangles above bars indicate regions of the contigs where less than 100% of the repeat patterns are on the same strand. The heights of bars in the histogram represent the density of G4-quadruplex forming regions. The first and last 50 kb of each contig containing at least 3 repeats are represented and the values are per 5 kb windows on the genome. As a complement to Figure 3, a supplementary figure online represents the distribution of Minc repeats and G4s on the whole contig length in 100 kb windows (https://doi.org/10.57745/1WDPE4). Source data to produce this figure is provided as a Source Data file.

B) Density of repeat patterns and G4-quadruplex along M. javanica contig extremities. The density of Mjav repeat units is represented by a color gradient with positive values (red) indicating a density in the sense strand and negative values (blue) indicating a density on the reverse complement strand. Gray triangles above bars indicate regions of the contigs where less than 100% of the repeat patterns are on the same strand. The heights of bars in the histogram represent the density of G4-quadruplex forming regions. The first and last 50 kb of each contig containing at least 3 repeats are represented and the values are per 5 kb windows on the genome; contigs that do not contain repeats in their first or last 50 kb are ignored. The distribution of Mjav repeats and G4s on the whole contig length in 100 kb windows is available online (https://doi.org/10.57745/EIKAQ4). Source data to produce this figure is provided as a Source Data file.

B) Density of repeat patterns and G4-quadruplex along M. arenaria contig extremities. The density of Mare repeat units is represented by a color gradient with positive values (red) indicating a density in the sense strand and negative values (blue) indicating a density on the reverse complement strand. Gray triangles above bars indicate regions of the contigs where less than 100% of the repeat patterns are on the same strand. The heights of bars in the histogram represent the density of G4-quadruplex forming regions. The first and last 50 kb of each contig containing at least 3 repeats are represented and the values are per 5 kb windows on the genome; contigs that do not contain repeats in their first or last 50 kb are ignored. The distribution of Mare repeats and G4s on the whole contig length in 100 kb windows is available online (https://doi.org/10.57745/CY06YE). Source data to produce this figure is provided as a Source Data file.
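
The strand-signed, windowed repeat densities described in these captions can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the figures: the captions do not define how density was computed, so here it is taken as the number of repeat units per 5 kb window, counted positively on the sense strand and negatively on the reverse complement strand. The function names, the `(position, strand)` input format and the example hits are hypothetical.

```python
def contig_extremities(contig_length, flank=50_000):
    """Return (start, end) for the first and last `flank` bases of a contig."""
    flank = min(flank, contig_length)
    return (0, flank), (contig_length - flank, contig_length)

def window_strand_density(hits, region_start, region_end, window=5_000):
    """Summarise repeat hits per window of a region.

    `hits` is an iterable of (position, strand) pairs, with 0-based positions
    and strand given as '+' (sense) or '-' (reverse complement).
    """
    n_windows = -(-(region_end - region_start) // window)  # ceiling division
    plus = [0] * n_windows
    minus = [0] * n_windows
    for pos, strand in hits:
        if region_start <= pos < region_end:
            idx = (pos - region_start) // window
            if strand == "+":
                plus[idx] += 1
            else:
                minus[idx] += 1
    return [
        {
            "start": region_start + i * window,
            "signed": p - m,           # > 0: sense strand, < 0: reverse complement
            "mixed": p > 0 and m > 0,  # less than 100% of repeats on one strand
        }
        for i, (p, m) in enumerate(zip(plus, minus))
    ]

# Hypothetical repeat hits on a 200 kb contig.
hits = [(1_200, "+"), (3_400, "+"), (7_800, "+"), (7_900, "-"), (196_000, "-")]
head_region, tail_region = contig_extremities(200_000)
head = window_strand_density(hits, *head_region)
tail = window_strand_density(hits, *tail_region)
```

Under this assumption, the `mixed` flag plays the role of the gray triangles in the figures, marking windows in which the repeats are not all on the same strand.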

Supplementary Table 1: Statistics of EuGene predictions